Insights Core is a framework for collecting and processing data about systems. It allows users to write components that collect and transform sets of raw data into typed Python objects, which can then be used in rules that encapsulate knowledge about them.
To accomplish this the framework uses an internal dependency engine. Components in the form of class or function definitions declare dependencies on other components with decorators, and the resulting graphs can be executed once all components you care about have been loaded.
This is an introduction to the dependency system followed by a summary of the standard components Insights Core provides.
In [1]:
import sys
sys.path.insert(0, "../..")
from insights.core import dr
In [2]:
# Here's our component type with the clever name "component."
# Insights Core provides several types that we'll come to later.
class component(dr.ComponentType):
    pass
In [3]:
import random
# Make two components with no dependencies
@component()
def rand():
    return random.random()

@component()
def three():
    return 3

# Make a component that depends on the other two. Notice that we depend on two
# things, and there are two arguments to the function.
@component(rand, three)
def mul_things(x, y):
    return x * y
In [4]:
# Now that we have a few components defined, let's run them.
from pprint import pprint
# If you call run with no arguments, all components of every type (with a few caveats
# I'll address later) are run, and their values or exceptions are collected in an
# object called a broker. The broker is like a fancy dictionary that keeps up with
# the state of an evaluation.
broker = dr.run()
pprint(broker.instances)
In [5]:
class stage(dr.ComponentType):
    pass
In [6]:
@stage(mul_things)
def spam(m):
    return int(m)
In [7]:
broker = dr.run()
print "All Instances"
pprint(broker.instances)
print
print "Components"
pprint(broker.get_by_type(component))
print
print "Stages"
pprint(broker.get_by_type(stage))
In [8]:
class thing(dr.ComponentType):
    def invoke(self, broker):
        return self.component(broker)

@thing(rand, three)
def stuff(broker):
    r = broker[rand]
    t = broker[three]
    return r + t
In [9]:
broker = dr.run()
print(broker[stuff])
Notice that the broker can be used as a dictionary to get the value of components that have already executed, without directly looking at the broker.instances attribute.
When a component raises an exception, the exception is recorded in a dictionary whose key is the component and whose value is a list of exceptions. The traceback related to each exception is recorded in a dictionary of exceptions to tracebacks. We record exceptions in a list because some components may generate more than one value. We'll come to that later.
In [10]:
@stage()
def boom():
    raise Exception("Boom!")

broker = dr.run()
e = broker.exceptions[boom][0]
t = broker.tracebacks[e]
pprint(e)
print()
print(t)
A component with any missing required dependencies will not be called. Missing dependencies are recorded in the broker in a dictionary whose keys are components and whose values are tuples with two values. The first is a list of all missing required dependencies. The second is a list of all dependencies of which at least one was required.
In [11]:
@stage("where's my stuff at?")
def missing_stuff(s):
return s
broker = dr.run()
print broker.missing_requirements[missing_stuff]
In [12]:
@stage("a", "b", [rand, "d"], ["e", "f"])
def missing_more_stuff(a, b, c, d, e, f):
return a + b + c + d + e + f
broker = dr.run()
print broker.missing_requirements[missing_more_stuff]
Notice that the first elements in the dependency list after @stage are simply "a" and "b", but the next two elements are themselves lists. This means that at least one element of each list must be present. The first "any" list has [rand, "d"], and rand is available, so it resolves. However, neither "e" nor "f" is available, so the resolution fails. Our missing dependencies list includes the first two standalone elements as well as the second "any" list.
In [13]:
@stage(rand, optional=['test'])
def is_greater_than_ten(r, t):
    return (int(r*10.0) < 5.0, t)

broker = dr.run()
print(broker[is_greater_than_ten])
The definition of a component type may include requires and optional attributes. Their specifications are the same as the requires and optional portions of the component decorators. Any component decorated with a component type that has requires or optional in the class definition will automatically depend on the specified components, and any additional dependencies on the component itself will just be appended.
This functionality should almost never be used because it makes it impossible to tell from a component's own declaration that it has implied dependencies.
In [14]:
class mything(dr.ComponentType):
    requires = [rand]

@mything()
def dothings(r):
    return 4 * r

broker = dr.run(broker=broker)
pprint(broker[dothings])
pprint(dr.get_dependencies(dothings))
In [15]:
class anotherthing(dr.ComponentType):
    metadata = {"a": 3}

@anotherthing(metadata={"b": 4, "c": 5})
def four():
    return 4

dr.get_metadata(four)
So far we haven't said how we might group components together outside of defining different component types. But sometimes we might want to specify certain components, even of different component types, to belong together and to only be executed when explicitly asked to do so.
All of our components so far have implicitly belonged to the default group. However, component types and even individual components can be assigned to specific groups, which will run only when specified.
In [16]:
class grouped(dr.ComponentType):
    group = "grouped"

@grouped()
def five():
    return 5

b = dr.Broker()
dr.run(dr.COMPONENTS["grouped"], broker=b)
pprint(b.instances)
If a group isn't specified in the type definition or in the component decorator, the default group is assumed. Likewise, the default group is assumed when calling run if one isn't provided.
It's also possible to override the group of an individual component by using the group keyword in its decorator.
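For example, here's a minimal sketch (mirroring the cell above) that moves a single stage component into the "grouped" group:
# The group keyword on the decorator overrides the component type's default group.
@stage(group="grouped")
def seven():
    return 7

b = dr.Broker()
dr.run(dr.COMPONENTS["grouped"], broker=b)
pprint(b.instances)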
Since hundreds or even thousands of dependencies can be defined, it's sometimes useful to separate them into graphs that don't share any components and execute those graphs one at a time. In addition to the run function, the dr module provides a run_incremental function that does exactly that. You can give it a starting broker (or none at all), and it will yield a new broker for each distinct graph among all the dependencies.
The run_all function is similar to run_incremental since it breaks a graph up into independently executable subgraphs before running them. However, it returns a list of the brokers instead of yielding one at a time. It also has a pool keyword argument that accepts a concurrent.futures.ThreadPoolExecutor, which it will use to run the independent subgraphs in parallel. This can provide a significant performance boost in some situations.
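Based on the descriptions above, usage looks roughly like this:
from concurrent.futures import ThreadPoolExecutor

# one broker per independent subgraph, yielded as each finishes
for b in dr.run_incremental():
    pprint(b.instances)

# or collect every broker at once, running subgraphs in parallel
with ThreadPoolExecutor(max_workers=4) as pool:
    brokers = dr.run_all(pool=pool)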
In [17]:
from insights.core import dr
@stage()
def six():
    return 6

@stage(six)
def times_two(x):
    return x * 2

# If the component's full name was foo.bar.baz.six, this would print "baz"
print("\nModule (times_two):", dr.get_base_module_name(times_two))
print("\nComponent Type (times_two):", dr.get_component_type(times_two))
print("\nDependencies (times_two): ")
pprint(dr.get_dependencies(times_two))
print("\nDependency Graph (stuff): ")
pprint(dr.get_dependency_graph(stuff))
print("\nDependents (rand): ")
pprint(dr.get_dependents(rand))
print("\nGroup (six):", dr.get_group(six))
print("\nMetadata (four): ", end="")
pprint(dr.get_metadata(four))
# prints the full module name of the component
print("\nModule Name (times_two):", dr.get_module_name(times_two))
# prints the module name joined to the component name by a "."
print("\nName (times_two):", dr.get_name(times_two))
print("\nSimple Name (times_two):", dr.get_simple_name(times_two))
If you have components defined in a package and the root of that path is in sys.path, you can load the package and all its subpackages and modules by calling dr.load_components. This way you don't have to load every component module individually.
# recursively load all packages and modules in path.to.package
dr.load_components("path.to.package")
# or load a single module
dr.load_components("path.to.package.module")
Now that you know the basics of Insights Core dependency resolution, let's move on to the rest of Core that builds on it.
The standard component types provided by Insights Core are datasource, parser, combiner, rule, condition, and incident. They're defined in insights.core.plugins.
Some have specialized interfaces and executors that adapt the dependency specification parts described above to what developers using previous versions of Insights Core have come to expect.
For more information on parser, combiner, and rule development, please see our component developer tutorials.
A datasource used to be called a spec. Components of this type collect data and make it available to other components. Since we have several hundred predefined datasources that fall into just a handful of categories, we've streamlined the process of creating them.
Datasources are defined either with the @datasource decorator or with helper functions from insights.core.spec_factory. The spec_factory module has a handful of functions for defining common datasource types.
All datasources defined with these helper functions depend on an ExecutionContext of some kind. Contexts let you activate different datasources for different environments. Most of them provide a root path for file collection and may perform some environment-specific setup for commands, even modifying the command strings if needed.
For now, we'll use a HostContext. This tells datasources to collect files starting at the root of the file system and to execute commands exactly as they are defined. Other contexts are in insights.core.context.
All file collection datasources depend on any context that provides a path to use as root unless a particular context is specified. In other words, some datasources will activate for multiple contexts unless told otherwise.
In [18]:
from insights.core import dr
from insights.core.context import HostContext
from insights.core.spec_factory import (simple_file,
                                        glob_file,
                                        simple_command,
                                        listdir,
                                        foreach_execute,
                                        foreach_collect,
                                        first_file,
                                        first_of)

release = simple_file("/etc/redhat-release")
hostname = simple_file("/etc/hostname")

ctx = HostContext()
broker = dr.Broker()
broker[HostContext] = ctx

broker = dr.run(broker=broker)
print(broker[release].path, broker[release].content)
print(broker[hostname].path, broker[hostname].content)
glob_file accepts glob patterns and evaluates at runtime to a list of TextFileProvider instances, one for each match. You can pass glob_file a single pattern or a list (or set) of patterns. It also accepts an ignore keyword, which should be a regular expression string matching paths to ignore. The glob and ignore patterns can be used together to match lots of files and then throw out the ones you don't want.
In [19]:
host_stuff = glob_file("/etc/host*", ignore="(allow|deny)")
broker = dr.run(broker=broker)
print(broker[host_stuff])
simple_command allows you to get the results of a command that takes no arguments or for which you know all of the arguments up front.
It and other command datasources return a CommandOutputProvider instance, which has the command string, any arguments interpolated into it (more later), the return code if you requested it via the keep_rc=True keyword, and the command output as a list of lines.
simple_command also accepts a timeout keyword, which is the maximum number of seconds the system should attempt to execute the command before a CalledProcessError is raised for the component.
A default timeout for all commands can be set on the initial ExecutionContext instance with the timeout keyword argument. If a timeout isn't specified in the ExecutionContext or on the command itself, none is used.
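Here's a quick sketch of both ways to set a timeout, assuming the keywords described above:
# default 30 second timeout for every command run under this context
ctx = HostContext(timeout=30)

# a per-command timeout overrides the context default for this one datasource
uptime = simple_command("/usr/bin/uptime", timeout=10)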
In [20]:
uptime = simple_command("/usr/bin/uptime")
broker = dr.run(broker=broker)
print(broker[uptime].cmd, broker[uptime].args, broker[uptime].rc, broker[uptime].content)
In [21]:
interfaces = listdir("/sys/class/net")
broker = dr.run(broker=broker)
pprint(broker[interfaces])
foreach_execute allows you to use output from one component as input to a datasource command string. For example, using the output of the interfaces datasource above, we can get ethtool information about all of the ethernet devices.
The timeout description provided in the simple_command section applies here to each separate invocation.
In [22]:
ethtool = foreach_execute(interfaces, "ethtool %s")
broker = dr.run(broker=broker)
pprint(broker[ethtool])
Notice each element in the list returned by interfaces is a single string. The system interpolates each element into the ethtool command string and evaluates each result. This produces a list of objects, one for each input element, instead of a single object. If the list created by interfaces contained tuples with n elements, then our command string would have had n substitution parameters.
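For instance, here's a hypothetical datasource (nic_queries isn't a real Insights spec) that yields two-element tuples, so its command string needs two substitution parameters:
from insights.core.plugins import datasource

@datasource(HostContext)
def nic_queries(ctx):
    # each tuple supplies one value per %s in the command string below
    return [("-i", "eth0"), ("-i", "eth1")]

ethtool_info = foreach_execute(nic_queries, "ethtool %s %s")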
first_of is a way to express that you want to use any datasource from a list of datasources you've already defined. This is helpful if the way you collect data differs in different contexts, but the output is the same.
For example, the way you collect installed rpms directly from a machine differs from how you would collect them from a docker image. Ultimately, downstream components don't care: they just want rpm data.
You could do the following. Notice that host_rpms and docker_installed_rpms implement different ways of getting rpm data that depend on different contexts, but the final installed_rpms datasource just references whichever one ran.
In [23]:
from insights.specs.default import format_rpm
from insights.core.context import DockerImageContext
from insights.core.plugins import datasource
from insights.core.spec_factory import CommandOutputProvider

rpm_format = format_rpm()
cmd = "/usr/bin/rpm -qa --qf '%s'" % rpm_format
host_rpms = simple_command(cmd, context=HostContext)

@datasource(DockerImageContext)
def docker_installed_rpms(ctx):
    root = ctx.root
    cmd = "/usr/bin/rpm -qa --root %s --qf '%s'" % (root, rpm_format)
    result = ctx.shell_out(cmd)
    return CommandOutputProvider(cmd, ctx, content=result)

installed_rpms = first_of([host_rpms, docker_installed_rpms])

broker = dr.run(broker=broker)
pprint(broker[installed_rpms])
In [24]:
from insights.core import Parser
from insights.core.plugins import parser

@parser(hostname)
class HostnameParser(Parser):
    def parse_content(self, content):
        self.host, _, self.domain = content[0].partition(".")

broker = dr.run(broker=broker)
print("Host:", broker[HostnameParser].host)
Notice that the parser decorator accepts only one argument: the datasource the component needs. Also notice that our parser has a sensible default constructor that accepts a datasource and passes its content into a parse_content function.
Our hostname parser is pretty simple, but it's easy to see how parsing things like rpm data or configuration files could get complicated.
Speaking of rpms, hopefully it's also easy to see that an rpm parser could depend on our installed_rpms definition in the previous section and parse the content regardless of where the content originated.
Not only do parsers have a special decorator, they also have a special executor. If the datasource is a list, the executor will attempt to construct a parser object with each element of the list, and the value of the parser in the broker will be the list of parser objects. It's important to keep this in mind when developing components that depend on parsers.
This is also why exceptions raised by components are stored as lists by component instead of single values.
Here's a simple parser that depends on the ethtool datasource.
In [25]:
@parser(ethtool)
class Ethtool(Parser):
    def parse_content(self, content):
        self.link_detected = None
        self.device = None
        for line in content:
            if "Settings for" in line:
                self.device = line.split(" ")[-1].strip(":")
            if "Link detected" in line:
                self.link_detected = line.split(":")[-1].strip()

broker = dr.run(broker=broker)
for eth in broker[Ethtool]:
    print("Device:", eth.device)
    print("Link? :", eth.link_detected, "\n")
We provide curated parsers for all of our datasources. They're in insights.parsers.
Combiners depend on two or more other components. They are typically used to standardize interfaces or to provide a higher-level view of some set of components.
As an example of standardizing interfaces, the chkconfig and service commands can be used to retrieve similar data about service status, but the command you run to check that status depends on your operating system version. A datasource would be defined for each command along with a parser to interpret its output. However, a downstream component may just care about a service's status, not about how a particular program exposes it. A combiner can depend on both the chkconfig and service parsers (like this, so only one of them is required: @combiner([chkconfig, service])) and provide a unified interface to the data.
As an example of a higher-level view of several related components, imagine a combiner that depends on various ethtool and other network information gathering parsers. It can compile all of that information behind one view, exposing a range of information about devices, interfaces, iptables, etc. that might otherwise be scattered across a system.
We provide a few common combiners. They're in insights.combiners.
Here's an example combiner that tries a few different ways to determine the Red Hat release information. Notice that its dependency declarations and interface are just like we've discussed before. If this were a class, the __init__ function would be declared like def __init__(self, rh_release, un).
from collections import namedtuple

from insights.core.plugins import combiner
from insights.parsers.redhat_release import RedhatRelease as rht_release
from insights.parsers.uname import Uname

# value type for the release version, assumed here to be a (major, minor) pair
Release = namedtuple("Release", field_names=["major", "minor"])

@combiner([rht_release, Uname])
def redhat_release(rh_release, un):
    if un and un.release_tuple[0] != -1:
        return Release(*un.release_tuple)
    if rh_release:
        return Release(rh_release.major, rh_release.minor)
    raise Exception("Unable to determine release.")
Rules depend on parsers and/or combiners and encapsulate particular policies about their state. For example, a rule might detect whether a defective rpm is installed. It might also inspect the lsof parser to determine if a process is using a file from that defective rpm. It could also check network information to see if the process is a server and whether it's bound to an internal or external IP address. Rules can check for anything you can surface in a parser or a combiner.
Rules use the make_fail, make_pass, or make_info helpers to create their return values. They take one required parameter, which is a key identifying the particular state the rule wants to highlight, and any number of additional keyword arguments that provide context for that state.
In [26]:
from insights.core.plugins import rule, make_fail, make_pass

ERROR_KEY = "IS_LOCALHOST"

@rule(HostnameParser)
def report(hn):
    return make_pass(ERROR_KEY) if "localhost" in hn.host else make_fail(ERROR_KEY)

brok = dr.Broker()
brok[HostContext] = HostContext()

brok = dr.run(broker=brok)
pprint(brok.get(report))
Conditions and incidents are optional components that can be used by rules to encapsulate particular pieces of logic.
Conditions are questions with answers that can be interpreted as True or False. For example, a condition might be "Does the kdump configuration contain a 'net' target type?" or "Is the operating system Red Hat Enterprise Linux 7?"
Incidents, on the other hand, typically are specific types of warning or error messages from log type files.
Why would you use conditions or incidents instead of just writing the logic directly into the rule? Future versions of Insights may allow automated analysis of rules and their conditions and incidents. You will be able to tell which conditions, incidents, and rule firings across all rules correspond with each other and how strongly. This feature will become more powerful as conditions and incidents are written independently of explicit rules.
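As a minimal sketch, using the condition decorator from insights.core.plugins and the HostnameParser defined earlier, a rule can depend on a condition just like any other component:
from insights.core.plugins import condition, rule, make_fail

@condition(HostnameParser)
def is_localhost(hn):
    # a yes/no question about parsed state
    return "localhost" in hn.host

@rule(is_localhost)
def localhost_report(lh):
    if lh:
        return make_fail("IS_LOCALHOST")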
Insights Core allows you to attach functions to component types, and they'll be called any time a component of that type is encountered. You can attach observer functions globally or to a particular broker.
Observers are called whether a component succeeds or not. They take the component and the broker right after the component is evaluated and so are able to ask the broker about values, exceptions, missing requirements, etc.
In [27]:
def observer(c, broker):
    if c not in broker:
        return
    value = broker[c]
    pprint(value)

broker.add_observer(observer, component_type=parser)
broker = dr.run(broker=broker)